Unit selection based text-to-speech synthesis (TTS) has been the dominant TTS approach of the last decade. Despite its success, the unit selection approach has disadvantages. One of the most significant is the sudden discontinuities in speech that distract listeners (Speech Commun 51:1039–1064, 2009). A second is that building a high-quality synthesis system requires significant expertise and large amounts of data, which is costly and time-consuming. The statistical speech synthesis (SSS) approach is a promising alternative. Not only are the spurious errors observed in unit selection systems mostly absent in SSS, but building voice models is also far less expensive and faster than with unit selection. However, the resulting speech is typically not as natural-sounding as speech synthesized with a high-quality unit selection system. There are hybrid methods that attempt to take advantage of both SSS and unit selection systems. However, existing hybrid methods still require the development of a high-quality unit selection system. Here, we propose a novel hybrid statistical/unit selection system for Turkish that aims to improve the quality of the baseline SSS system by improving prosodic parameters such as intonation and stress. Commonly occurring suffixes in Turkish are stored in the unit selection database and used in the proposed system. As opposed to existing hybrid systems, the proposed system was developed without building a complete unit selection synthesis system. Therefore, the proposed method can be used without collecting large amounts of data or investing the substantial expertise and time-consuming tuning typically required to build unit selection systems. Listeners preferred the hybrid system over the baseline system in AB preference tests.